Rating a video based on its content is an important step for classifying video age categories. Movie content rating and TV show rating are the two most common rating systems, established by professional committees. However, manually reviewing and evaluating scene/film content by a committee is tedious work that becomes increasingly difficult with the ever-growing amount of online video content. A desirable solution is therefore to use computer vision based video content analysis techniques to automate the evaluation process. In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection in the context of content moderation and movie content rating. The project page is available at https://github.com/fcakyon/content-moderation-deep-learning.
Drone detection has become an important object detection task as drone costs decrease and drone technology improves. However, it is difficult to detect distant drones under weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the false-positive ratio of detected drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed ideas. As the experiments show, using sequence information, the bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, the R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.
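To make the transfer-learning setup concrete, here is a minimal sketch (not the paper's exact fully convolutional model) that fine-tunes a Kinetics-400-pretrained R(2+1)D backbone from torchvision for binary drone-vs-bird sequence classification; the clip shape, optimizer, and classifier head are illustrative assumptions.

```python
# Minimal transfer-learning sketch; clip shape, optimizer, and head are assumptions.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18, R2Plus1D_18_Weights

num_classes = 2  # drone vs. bird

model = r2plus1d_18(weights=R2Plus1D_18_Weights.DEFAULT)  # pretrained on Kinetics-400
model.fc = nn.Linear(model.fc.in_features, num_classes)   # replace the classification head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch of track crops:
# clips have shape (batch, channels, frames, height, width).
clips = torch.randn(4, 3, 16, 112, 112)
labels = torch.randint(0, num_classes, (4,))

model.train()
optimizer.zero_grad()
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()
```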
Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect with conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed, providing a generic slicing-aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations using object detection baselines on the VisDrone and xView aerial object detection datasets show that the proposed inference method increases object detection AP by 6.8%, 5.1%, and 5.3% for the FCOS, VFNet, and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with slicing-aided fine-tuning, resulting in cumulative increases of 12.7%, 13.4%, and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection, and YOLOv5 models and is publicly available at https://github.com/obss/sahi.git.
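As a usage illustration, the sketch below runs slicing-aided inference with the SAHI package on top of a YOLOv5 checkpoint; the checkpoint path, slice sizes, and overlap ratios are example values, and exact class and function names may differ slightly between SAHI versions.

```python
# Minimal sliced-inference sketch with SAHI (pip install sahi yolov5).
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="yolov5s.pt",      # any YOLOv5 checkpoint
    confidence_threshold=0.3,
    device="cuda:0",              # or "cpu"
)

# The image is split into overlapping slices, the detector runs on each slice,
# and the per-slice predictions are merged back into full-image coordinates.
result = get_sliced_prediction(
    "aerial_image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

print(len(result.object_prediction_list), "objects detected")
```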
Drone detection has emerged as an important object detection task as drone usage increases with lower costs and improved drone technology. However, detecting distant drones under unfavorable conditions, namely weak contrast, long range, and low visibility, requires effective algorithms. Our method approaches the drone detection problem by fine-tuning a YOLOv5 model with real and synthetically generated data, using a Kalman-based object tracker to boost detection confidence. Our results show that augmenting the real data with an optimal subset of synthetic data can improve performance. Moreover, temporal information gathered by the object tracking method can further improve performance.
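The sketch below illustrates one simple way temporal information from a tracker can boost per-frame detection confidence; the running-average fusion rule and history length are assumptions for illustration, not the paper's exact track-boosting scheme.

```python
# Illustrative confidence boosting from track history (assumed fusion rule).
from collections import defaultdict, deque

class TrackConfidenceBooster:
    def __init__(self, history: int = 10):
        self.history = defaultdict(lambda: deque(maxlen=history))

    def update(self, track_id: int, detection_score: float) -> float:
        """Return a temporally smoothed confidence for the given track."""
        self.history[track_id].append(detection_score)
        scores = self.history[track_id]
        smoothed = sum(scores) / len(scores)
        # Keep the higher of the raw and smoothed scores so that stable tracks
        # with occasional low-confidence frames are not suppressed.
        return max(detection_score, smoothed)

booster = TrackConfidenceBooster(history=10)
for frame_score in [0.82, 0.79, 0.30, 0.85]:  # hypothetical per-frame YOLOv5 scores
    print(booster.update(track_id=1, detection_score=frame_score))
```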
While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience, and resources. To reduce the expenses associated with manual construction and to satisfy the need for a continuous supply of new questions, automatic question generation (QG) techniques can be utilized. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG, and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Evaluation results show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on the TQuADv1 and TQuADv2 datasets and the XQuAD Turkish split. The source code and pre-trained models are available at https://github.com/obss/turkish-question-generation.
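The sketch below shows how such a multi-task model could be queried for question generation with HuggingFace transformers; the base checkpoint name, the "generate question:" task prefix, and the <hl> answer-highlighting convention are placeholders, and the released fine-tuned models and exact prompt formats are documented in the linked repository.

```python
# Minimal inference sketch; checkpoint and prompt format are illustrative assumptions.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-small"  # substitute a fine-tuned multi-task checkpoint here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

# Question generation: the answer span is marked with <hl> tokens (assumed convention).
source_text = ("generate question: Mustafa Kemal Atatürk <hl> 1881 <hl> yılında "
               "Selanik'te doğdu.")
inputs = tokenizer(source_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```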
Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, various adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as 'gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a 'match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size. Furthermore, our proposed G-PGA is generic, so it can be combined with an ensemble attack strategy, as we demonstrate for the case of Auto-Attack, leading to efficiency and convergence speed improvements. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
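For reference, the sketch below implements the standard L-infinity PGD baseline that G-PGA builds on; it does not include the proposed 'match and deceive' loss or the surrogate-model guidance.

```python
# Standard L-infinity PGD baseline in PyTorch (not the proposed G-PGA).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iteratively ascend the loss and project back into the eps-ball around x."""
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```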
Today's software is bloated, leading to significant resource waste. This bloat is prevalent across the entire software stack, from the operating system all the way to software backends, frontends, and web pages. In this paper, we study how prevalent bloat is in machine learning containers. We develop MMLB, a framework for analyzing bloat in machine learning containers, measuring the amount of bloat that exists at the container and package levels. Our tool quantifies the sources of bloat and removes them. We integrate our tool with vulnerability analysis tools to measure how bloat affects container vulnerabilities. We experimentally study 15 machine learning containers from the official TensorFlow, PyTorch, and NVIDIA container registries under different tasks (i.e., training, tuning, and serving). Our findings show that machine learning containers contain bloat encompassing up to 80% of the container size. We find that debloating machine learning containers speeds up provisioning times by up to 3.7x and removes up to 98% of all vulnerabilities detected by vulnerability analysis tools such as Grype. Finally, we relate our results to the larger discussion about technical debt in machine learning systems.
We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoint estimation. We compare two alternative approaches. The first is a test-time adaptation (TTA) method, in which we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoint estimates as pseudo labels and backpropagating them to adjust the backbone weights. The second is a training-time generalization (TTG) method, in which we permit offline access to the labeled source dataset but not test-time modification of weights. Furthermore, we do not assume the availability of any images from, or knowledge about, the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoint head and feeding the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt and TTG shows only a small gain in performance, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.
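The sketch below illustrates the TTA idea on assumed model interfaces (model.backbone and model.keypoint_head are hypothetical attribute names): keypoint estimates on a single unlabeled test image act as pseudo labels, and their loss is backpropagated to update only the backbone weights; the optimizer, step count, and argmax pseudo-label rule are illustrative choices.

```python
# Illustrative single-image test-time adaptation on assumed model interfaces.
import torch
import torch.nn.functional as F

def test_time_adapt(model, image, steps=5, lr=1e-4):
    optimizer = torch.optim.SGD(model.backbone.parameters(), lr=lr)  # adapt backbone only
    model.train()
    for _ in range(steps):
        features = model.backbone(image)
        keypoint_logits = model.keypoint_head(features)
        pseudo_labels = keypoint_logits.detach().argmax(dim=1)  # keypoints as pseudo labels
        loss = F.cross_entropy(keypoint_logits, pseudo_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        return model(image)  # instance masks from the adapted network
```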
Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
Polynomial Networks (PNs) have recently demonstrated promising performance on face and image recognition. However, the robustness of PNs is unclear, and obtaining certificates is therefore crucial for their adoption in real-world applications. Existing verification algorithms for ReLU neural networks (NNs) based on branch and bound (BaB) techniques cannot be trivially applied to PN verification. In this work, we devise a new bounding method, equipped with BaB for global convergence guarantees, called VPN. One key insight is that the bounds we obtain are much tighter than the interval bound propagation baseline. This enables sound and complete PN verification, with empirical validation on the MNIST, CIFAR10, and STL10 datasets. We believe our method is of independent interest for NN verification.
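For context, the interval bound propagation (IBP) baseline mentioned above can be written out for a single affine layer as below; this is the standard baseline bound only, not the paper's tighter VPN bound.

```latex
% Standard IBP through one affine layer z = W x + b, given elementwise input
% bounds \underline{x} \le x \le \overline{x}. Baseline bound only; the paper's
% VPN bounds are reported to be much tighter.
\begin{align*}
  \mu  &= \tfrac{1}{2}\,(\overline{x} + \underline{x}), &
  r    &= \tfrac{1}{2}\,(\overline{x} - \underline{x}), \\
  \mu' &= W\mu + b, &
  r'   &= |W|\,r, \\
  \underline{z} &= \mu' - r', &
  \overline{z}  &= \mu' + r'.
\end{align*}
```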